Geospatial mapping with Python¶

Marco Chierici

June 1, 2022

(partially abridged from The Python Graph Gallery, Plotly doc, Folium doc)

We give here a short and partial overview of three main methods to deal with geospatial mapping in Python, with a focus on choropleth maps:

  • GeoPandas & geoplot - conda install -c conda-forge geopandas, conda install -c conda-forge geoplot
  • Plotly - conda install -c plotly plotly=5.8.0
  • Folium - conda install -c conda-forge folium
In [1]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# don't display warning messages
warnings.filterwarnings(action='ignore')

# show plots inline and make them interactive
%matplotlib inline

Ingredients¶

A map is a set of polygon coordinates displayed on a 2D canvas. So, first you need to get those polygons! Common sources are:

  • a shapefile (shp) or GeoJSON file - geopandas is the preferred way to import these files into Python
  • a Python library (geopandas again, or basemap) with the information for common locations or regions (Europe, USA, etc.)
  • OpenStreetMap or other similar tile providers.

Now that you have a set of polygons, you can plot it using different approaches. For example:

  • geoplot, if you have your data into a geopandas' geodataframe;
  • plotly, if you want or need an interactive map from a geodataframe;
  • folium, if you want to use Google Map styled maps.

Choropleth maps¶

A choropleth map is a map composed of colored polygons, conveying spatial variations of a quantity. To create them, you'll need:

  1. Geometry information (GeoJSON, SHP, etc.) - see above
  2. A list of values indexed by a feature identifier.

GeoPandas¶

GeoPandas extends the datatypes used by pandas to support spatial operations on geometric types.

We start by importing a geoJSON file with the state boundaries of France.

In [2]:
import geopandas as gpd
In [3]:
data = gpd.read_file("https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/france.geojson")
data.crs
Out[3]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

data is a geodataframe, where each row represents an item in the geoJSON file - in this case, a region of France. The columns describe the features of each region: in particular, the geometry column stores the coordinates of the region boundary.

In [4]:
print(type(data))
<class 'geopandas.geodataframe.GeoDataFrame'>

The most common and straightforward way to plot a map from a geopandas dataframe is with geoplot.

In [5]:
import geoplot as gplt
import geoplot.crs as gcrs # load projections

gplt.polyplot(data, projection=gcrs.AlbersEqualArea(), 
                 edgecolor='darkgrey', facecolor='lightgrey', 
                 linewidth=.3,
                 figsize=(12, 8))
plt.show()

Now that we have learned how to draw a map combining GeoPandas and geoplot, let's create a choropleth map to visualize the unemployment rate of each US county.

In [6]:
# Load the json file with county coordinates
geoData = gpd.read_file('https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/US-counties.geojson')
In [7]:
# Make sure the "id" column is an integer
geoData.id = geoData.id.astype(str).astype(int)
geoData
Out[7]:
id GEO_ID STATE COUNTY NAME LSAD CENSUSAREA geometry
0 1001 0500000US01001 01 001 Autauga County 594.436 POLYGON ((-86.49677 32.34444, -86.71790 32.402...
1 1009 0500000US01009 01 009 Blount County 644.776 POLYGON ((-86.57780 33.76532, -86.75914 33.840...
2 1017 0500000US01017 01 017 Chambers County 596.531 POLYGON ((-85.18413 32.87053, -85.12342 32.772...
3 1021 0500000US01021 01 021 Chilton County 692.854 POLYGON ((-86.51734 33.02057, -86.51596 32.929...
4 1033 0500000US01033 01 033 Colbert County 592.619 POLYGON ((-88.13999 34.58170, -88.13925 34.587...
... ... ... ... ... ... ... ... ...
3216 51001 0500000US51001 51 001 Accomack County 449.496 MULTIPOLYGON (((-75.24227 38.02721, -75.29687 ...
3217 51021 0500000US51021 51 021 Bland County 357.725 POLYGON ((-81.22510 37.23487, -81.20477 37.243...
3218 51027 0500000US51027 51 027 Buchanan County 502.763 POLYGON ((-81.96830 37.53780, -81.92787 37.512...
3219 51037 0500000US51037 51 037 Charlotte County 475.271 POLYGON ((-78.44332 37.07940, -78.49303 36.891...
3220 51041 0500000US51041 51 041 Chesterfield County 423.297 POLYGON ((-77.85180 37.35487, -77.85515 37.418...

3221 rows × 8 columns

In [8]:
# Remove Alaska, Hawaii and Puerto Rico.
stateToRemove = ['02', '15', '72']
geoData = geoData[~geoData.STATE.isin(stateToRemove)]

# Basic plot with just county outlines
gplt.polyplot(geoData, figsize=(20, 4))

plt.show()

To build a choropleth map, we need numeric data to link to the map: we load the unemployment rate of US counties from the Bureau of Labor Statistics (source).

In [9]:
data = pd.read_csv('https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/unemployment-x.csv')
data.head()
Out[9]:
id state county rate
0 1001 Alabama Autauga County 5.1
1 1003 Alabama Baldwin County 4.9
2 1005 Alabama Barbour County 8.6
3 1007 Alabama Bibb County 6.2
4 1009 Alabama Blount County 5.1
In [10]:
# plot a histogram of unemployment rate
sns.distplot(data["rate"], hist=True, kde=False, rug=False );

Now we can merge both sources of data (map and unemployment rates) in order to visualize them together. We'll be using the merge() function of geopandas:

In [11]:
fullData = geoData.merge(data, left_on=['id'], right_on=['id'])
fullData.head()
Out[11]:
id GEO_ID STATE COUNTY NAME LSAD CENSUSAREA geometry state county rate
0 1001 0500000US01001 01 001 Autauga County 594.436 POLYGON ((-86.49677 32.34444, -86.71790 32.402... Alabama Autauga County 5.1
1 1009 0500000US01009 01 009 Blount County 644.776 POLYGON ((-86.57780 33.76532, -86.75914 33.840... Alabama Blount County 5.1
2 1017 0500000US01017 01 017 Chambers County 596.531 POLYGON ((-85.18413 32.87053, -85.12342 32.772... Alabama Chambers County 5.0
3 1021 0500000US01021 01 021 Chilton County 692.854 POLYGON ((-86.51734 33.02057, -86.51596 32.929... Alabama Chilton County 5.2
4 1033 0500000US01033 01 033 Colbert County 592.619 POLYGON ((-88.13999 34.58170, -88.13925 34.587... Alabama Colbert County 6.5

We are happy that geoplot provides a choropleth() function!

choropleth(df, projection=None, hue=None, cmap=None, scheme=None)

The hue parameter expects the name of the column we want to use to control the color of each county. Then, we have to pick a suitable color palette (cmap) and a binning scheme (scheme). Geoplot comes with both continuous and categorical binning schemes, i.e. methods that split a sequence of observations into a number of bins.

In [12]:
fig, ax = plt.subplots(1, 1, figsize=(16, 12))

# set up the binning sheme
import mapclassify as mc
scheme = mc.Quantiles(fullData['rate'], k=10)

# map
gplt.choropleth(fullData, 
    hue="rate", 
    linewidth=.1,
    scheme=scheme, cmap='inferno_r',
    legend=True,
    edgecolor='black',
    ax=ax
)

ax.set_title('Unemployment rate in US counties', fontsize=13)
Out[12]:
Text(0.5, 1.0, 'Unemployment rate in US counties')

Plotly¶

Plotly is a open-source graphing library aimed at interactivity for Python, R, and other languages. It can be used offline and does not require any account registration (Plotly.com has also paid licenses for enterprise users).

We'll create a choropleth map of the US unemployment rate with Plotly, this time using a slightly different approach for gathering the data.

We load a GeoJSON file with the geometry information for US counties:

In [13]:
from urllib.request import urlopen
import json

with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

counties["features"][0]
Out[13]:
{'type': 'Feature',
 'properties': {'GEO_ID': '0500000US01001',
  'STATE': '01',
  'COUNTY': '001',
  'NAME': 'Autauga',
  'LSAD': 'County',
  'CENSUSAREA': 594.436},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-86.496774, 32.344437],
    [-86.717897, 32.402814],
    [-86.814912, 32.340803],
    [-86.890581, 32.502974],
    [-86.917595, 32.664169],
    [-86.71339, 32.661732],
    [-86.714219, 32.705694],
    [-86.413116, 32.707386],
    [-86.411172, 32.409937],
    [-86.496774, 32.344437]]]},
 'id': '01001'}

Each feature.id is a FIPS code (see one of our previous R labs).

Next, we load unemployment data by county, also indexed by FIPS code.

In [14]:
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv",
                   dtype={"fips": str})
df.head()
Out[14]:
fips unemp
0 01001 5.3
1 01003 5.4
2 01005 8.6
3 01007 6.6
4 01009 5.5

We are ready for the plotting. We'll use Plotly express, Plotly's high-level API for creating figures. As it was for geoplot, we can use a very convenient choropleth() function.

In [15]:
import plotly.express as px

fig = px.choropleth(df, geojson=counties,
                    locations='fips',
                    color='unemp',
                    color_continuous_scale="Viridis",
                    range_color=(0, 12),
                    scope="usa",
                    labels={'unemp': 'unemployment rate'}
                   )

fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0})
fig.show()
In [16]:
import plotly.express as px

fig = px.choropleth(df, geojson=counties,
                    locations='fips',
                    color='unemp',
                    color_continuous_scale="Viridis",
                    range_color=(0, 12),
                    scope="usa",
                    labels={'unemp': 'unemployment rate'}
                   )

fig.update_layout(margin={"r":0, "t":0, "l":0, "b":0})

# better legend
fig.update_layout(coloraxis_colorbar=dict(
    thicknessmode="pixels", thickness=10,
    lenmode="pixels", len=150,
    yanchor="top", y=0.8,
    ticks="outside", ticksuffix=" %",
    dtick=5
))

fig.show()

Especially if you work outside Jupyter notebooks, you can save the output plot as a standalone HTML file:

In [17]:
fig.write_html("plotly_choropleth_unemp.html")

You can then embed the HTML map in any HTML document using an iframe:

<iframe src="plotly_choropleth_unemp.html" title="Plotly choropleth map" style={{ border: "none", width: "800px", height: "300px" }}></iframe>

Folium¶

Folium is a Python wrapper of the leaflet.js JavaScript library, putting together the best of the two worlds: Python's data processing capabilities and JavaScript's interactive data visualization.

With Folium we can create a map with just one line of code (two if you count the import):

In [18]:
import folium
map = folium.Map(location=[47.6151, -122.3413], zoom_start=13)
map
Out[18]:
Make this Notebook Trusted to load map: File -> Trust Notebook

As you see, by default Folium uses OpenStreetMap as tile provider.

The map can be saved to a standalone HTML file:

In [19]:
map.save("folium_basic_map.html")

You can use several options for the background tile using the tiles parameter. For example, let's take a look at Paris with a Stamen Toner or Stamen Terrain tile:

In [20]:
tonerMap = folium.Map(location=[48.85, 2.35], tiles="Stamen Toner", zoom_start=10)
tonerMap
Out[20]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [21]:
tonerMap = folium.Map(location=[48.85, 2.35], tiles="Stamen Terrain", zoom_start=10)
tonerMap
Out[21]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Adding markers to a Folium map is as easy as creating a Pandas dataframe storing their coordinates.

In [22]:
# Make an empty map
m = folium.Map(location=[20, 0], zoom_start=2)
m
Out[22]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [23]:
# Make a data frame with dots to show on the map
data = pd.DataFrame({
   'lon':[-58, 2, 145, 30.32, -4.03, -73.57, 36.82, -38.5],
   'lat':[-34, 49, -38, 59.93, 5.33, 45.52, -1.29, -12.97],
   'name':['Buenos Aires', 'Paris', 'melbourne', 'St Petersbourg', 'Abidjan', 'Montreal', 'Nairobi', 'Salvador'],
   'value':[10, 12, 40, 70, 23, 43, 100, 43]
}, dtype=str)

data
Out[23]:
lon lat name value
0 -58 -34 Buenos Aires 10
1 2 49 Paris 12
2 145 -38 melbourne 40
3 30.32 59.93 St Petersbourg 70
4 -4.03 5.33 Abidjan 23
5 -73.57 45.52 Montreal 43
6 36.82 -1.29 Nairobi 100
7 -38.5 -12.97 Salvador 43
In [24]:
# add marker one by one on the map
for i in range(0,len(data)):
   folium.Marker(
       location=[data.iloc[i]['lat'], data.iloc[i]['lon']],
       popup=data.iloc[i]['name']
   ).add_to(m)

# Show the map again
m
Out[24]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Time to create a choropleth map of the US unemployment rate with Folium. First, we grab the data: note that this time we only need a valid path / URL for the geoJSON with geometry and we do not need to read it in.

In [25]:
# geometry data
url = "https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
state_geo = f"{url}/us-states.json"
# unemployment data
state_unemployment = f"{url}/US_Unemployment_Oct2012.csv"
state_data = pd.read_csv(state_unemployment)

Next, we initialize a map setting tile and location:

In [26]:
m = folium.Map(location=[40, -95], zoom_start=4)
m
Out[26]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Finally, we create a folium.Choropleth object and we add it to our map:

In [27]:
folium.Choropleth(
    geo_data=state_geo,
    name="choropleth",
    data=state_data,
    columns=["State", "Unemployment"],
    key_on="feature.id",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=.1,
    legend_name="Unemployment Rate (%)",
).add_to(m) 

folium.LayerControl().add_to(m)

m
Out[27]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [ ]: